Skip to content

fix(eval): resolve mlflow_resource_arn in _get_base_template_context#5758

Open
mollyheamazon wants to merge 2 commits intoaws:masterfrom
mollyheamazon:fix/mlflow-eval-default
Open

fix(eval): resolve mlflow_resource_arn in _get_base_template_context#5758
mollyheamazon wants to merge 2 commits intoaws:masterfrom
mollyheamazon:fix/mlflow-eval-default

Conversation

@mollyheamazon
Copy link
Copy Markdown
Contributor

@mollyheamazon mollyheamazon commented Apr 14, 2026

Problem

BaseEvaluator has a Pydantic validator that auto-resolves mlflow_resource_arn at construction time. However, if sagemaker_session is None when the object is created, the validator exits early and the ARN is never resolved — leaving MLflow tracking silently disabled even when a default tracking server exists in the account.

Fix

Add a deferred resolution fallback in _get_base_template_context: if mlflow_resource_arn is still None when evaluate() is called and a session is available, resolve it then using the same _resolve_mlflow_resource_arn utility used by training.

Also promotes the import of _resolve_mlflow_resource_arn to module level in base_evaluator.py (was previously an inline import inside the validator).

Changes

  • base_evaluator.py: module-level import + 3-line deferred resolution in _get_base_template_context
  • test_base_evaluator.py: one new unit test covering the deferred resolution path

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link
Copy Markdown
Collaborator

@sagemaker-bot sagemaker-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Review

The fix correctly addresses the deferred resolution of mlflow_resource_arn when sagemaker_session is None at construction time. The approach is sound and the test covers the new path. A few minor issues: a missing blank line after the import, the test line length likely exceeds 100 characters, and the test could be slightly more robust in verifying the mock call count.

@@ -21,6 +21,7 @@
from sagemaker.core.helper.session_helper import Session

from sagemaker.train.base_trainer import BaseTrainer
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing blank line between this import and the # Module-level logger comment. PEP 8 and the existing file style expect a blank line separating import groups from module-level code.

from sagemaker.train.common_utils.finetune_utils import _resolve_mlflow_resource_arn

# Module-level logger

Returns:
dict: Base template context dictionary
"""
# Resolve MLflow ARN if not already resolved (e.g. session was None at construction time)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor consideration: _resolve_mlflow_resource_arn may raise exceptions (e.g., if the SageMaker API call fails). In the validator, failures are silently swallowed because the validator just returns None. Here, an exception would propagate up and fail evaluate(). Is that the desired behavior? If not, consider wrapping this in a try/except with a warning log, consistent with how the validator handles failures:

if not self.mlflow_resource_arn and self.sagemaker_session:
    try:
        self.mlflow_resource_arn = _resolve_mlflow_resource_arn(self.sagemaker_session)
    except Exception:
        _logger.warning("Failed to resolve MLflow resource ARN during deferred resolution.")

@@ -925,6 +925,37 @@ def test_get_base_template_context(self, mock_resolve, mock_session, mock_model_
assert context['dataset_artifact_arn'] == DEFAULT_ARTIFACT_ARN
assert 'action_arn_prefix' in context

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is well over 100 characters (the project's line length limit). Please wrap it:

    @patch(
        "sagemaker.train.evaluate.base_evaluator._resolve_mlflow_resource_arn"
    )
    def test_get_base_template_context_deferred_mlflow_resolution(
        self, mock_resolve_mlflow, mock_resolve, mock_session, mock_model_info
    ):

region=DEFAULT_REGION,
account_id="123456789012",
model_package_group_arn=DEFAULT_MODEL_PACKAGE_GROUP_ARN,
resolved_model_artifact_arn=DEFAULT_ARTIFACT_ARN,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using mock_resolve_mlflow.assert_called_once_with(mock_session) instead of assert_called_with to verify it was called exactly once during _get_base_template_context. Since the mock is also invoked by the validator during construction, you may want to reset the mock after construction (mock_resolve_mlflow.reset_mock()) before calling _get_base_template_context, then assert assert_called_once_with. This makes the test more precise about which call path triggered the resolution.

model=DEFAULT_MODEL,
s3_output_path=DEFAULT_S3_OUTPUT,
model_package_group=DEFAULT_MODEL_PACKAGE_GROUP_ARN,
sagemaker_session=mock_session,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting evaluator.mlflow_resource_arn = None directly after construction to simulate the deferred case works, but it's a bit fragile — it relies on Pydantic allowing direct attribute mutation. A comment explaining why this is necessary (or configuring the mock to return None only during construction) would improve clarity. Actually, you already set mock_resolve_mlflow.return_value = None before construction, so the validator should already leave it as None. Is this explicit assignment redundant? If so, removing it would simplify the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants